Description

Background and Summary of the Invention 

[0001] The present invention relates to speech analysis, and more particularly to a computer-implemented natural 
language parser. 

[0002] Understanding the meaning of a natural language sentence is the cornerstone of many fields of science, with
implications ranging from how humans interact with computers or machines to how they can interact with
other intelligent agents, human or machine, through translation systems. The task becomes more complicated when
the sentence is obtained using an automatic speech recognition (ASR) system, where recognition errors such as insertions,
omissions or substitutions can render the sentence less intelligible even to a human. Additional user-interface
related factors might also introduce an element of unnaturalness to the speaker's own utterance, so that the recognized
sentence may contain the effects of a user's hesitations, pauses, repetitions, and broken phrases or sentences.
[0003] Due to these factors, parsing natural language sentences occupies an important area in computer-implemented
speech related systems. However, current approaches for natural language parsers typically exhibit
sub-optimal robustness in handling the aforementioned errors of an automatic speech recognition system.
[0004] In "A modular approach to spoken language translation for large domains", M. Woszczyna, et al. Proceedings 
of AMTA-1998 18-31 October 1998 pages 1 - 10, there is disclosed a machine translation system specifically suited 
for spoken dialogue in which language is characterized by highly disfluent utterances which can be fragmented and 
ungrammatical, and in which a lattice of parse trees that contain all possible domain actions is created, where a domain 
action can include an operation such as requesting information or giving information and consists of three represen- 
tational levels: consistency of a speech act, concepts, and arguments. 

[0005] The present invention overcomes the aforementioned disadvantages as well as other disadvantages.
[0006] In accordance with the teachings of the present invention, a computer-implemented speech parsing method 
and apparatus for processing an input phrase is provided. The method and apparatus include providing a plurality of 
grammars that are indicative of predetermined topics. A plurality of parse forests are generated using the grammars, 
and tags are associated with words in the input phrase using the generated parse forests. Scores are generated for 
tags based upon attributes of the parse forests, and tags are selected for use as a parsed representation of the input 
phrase based upon the generated scores. 

[0007] For a more complete understanding of the invention, its objects and advantages, reference should be made 
to the following specification and to the accompanying drawings. 

Brief Description of the Drawings 

[0008] 

Figure 1 is a block diagram depicting the computer-implemented components utilized to effect a dialog between
at least two people with different languages;

Figure 2 is a block diagram depicting the components of the system of Figure 1 in more detail;

Figure 3 is a tag generation diagram depicting the application of the semantic tag generation process to an input
sentence;

Figure 4 is a block diagram depicting the components of the local parser of the present invention;

Figure 5 is a tag generation diagram depicting the application of the semantic tag generation process to an input
sentence;

Figure 6 is a parse tree diagram depicting a model for a parse tree for an input sentence;

Figure 7 is a parse tree diagram depicting multiple tags being generated as candidates during intermediate stages
of local parsing;

Figure 8 is a process diagram depicting the output at different intervals for the present invention;

Figure 9 is a computer screen display of an exemplary cost grammar;

Figure 10 is a computer screen display of a parse forest generated for an input sentence;

Figure 11 is a graphical parse forest showing a partial representation in a graphical format of the parse forest in
Figure 10;

Figure 12 is a flow chart depicting the operational steps associated with the present invention being utilized in an
exemplary application; and

Figure 13 is a flow chart depicting the operational steps associated with processing an input sentence using the
local parser of the present invention.
Description of the Preferred Embodiment

[0009] Figure 1 depicts a computer-implemented dialog continuous speech processing system for allowing two peo- 
ple who speak different languages to effectively communicate. In the non-limiting example of Figure 1, a buyer 20 
wishes to communicate with salesperson 22 in order to purchase a piece of merchandise. The difficulty arises in that 
buyer 20 speaks only English while salesperson 22 speaks only Japanese. 

[0010] The dialog speech processing system 24 of the present invention uses a speech recognizer 26 to transform 
the English speech of buyer 20 into a string of words. The string of words is read as text by a speech understanding 
module 28 which extracts the semantic components of the string. 

[0011] A dialog manager 30 determines whether a sufficient amount of information has been provided by buyer 20
based upon the semantic components determined by speech understanding module 28. If a sufficient amount of information
has been provided, dialog manager 30 allows translation module 32 to translate the buyer's speech from the
determined semantic components to Japanese. Translation module 32 translates the semantic components into Japanese
and performs speech synthesis via computer response module 42 in order to vocalize the Japanese translation
for salesperson 22 to hear.

[0012] Salesperson 22 then utilizes the dialog speech processing system 24 to respond to buyer 20. Accordingly, a
Japanese speech recognizer 36 and Japanese speech understanding module 38 respectively perform speech recognition
of the speech of salesperson 22 if insufficient information has been provided by salesperson 22.
[0013] If dialog manager 30 determines that an insufficient amount of information has been provided by buyer 20 for
accomplishing a predetermined goal (such as purchasing a piece of merchandise), dialog manager 30 instructs a
computer response module 34 to vocalize a response which will ask buyer 20 to provide the missing piece(s) of information.

[0014] The preferred embodiment is suitable for implementation in a hand-held computer device 43 where the device
is a tool allowing the user to formulate his or her request in the target language. Such a portable hand-held device is 
well suited for making a ticket/hotel reservation in a foreign country, purchasing a piece of merchandise, performing 
location directory assistance, or exchanging money. The preferred embodiment allows the user to switch from one task 
to another by selecting on the hand-held device which task they would like to perform. In an alternate embodiment, a 
flash memory card which is unique to each task can be provided so that a user can switch from one task to another. 
[0015] Figure 2 depicts components of the dialog speech processing system 24 in more detail. In particular, speech 
understanding module 28 includes a local parser 60 to identify predetermined relevant task-related fragments. Speech 
understanding module 28 also includes a global parser 62 to extract the overall semantics of the buyer's request. 
[0016] In the preferred embodiment, the novel local parser 60 utilizes multiple small grammars along with several
passes and a unique scoring mechanism to provide parse hypotheses. For example, according to this approach the novel
local parser recognizes phrases such as dates, names of cities, and prices. If a speaker utters "get me a flight to
Boston on January 23rd which also serves lunch", the local parser recognizes: "Boston" as a city name; "January 23rd" 
as a date; and "lunch" as being about a meal. The global parser assembles those items (city name, date, etc.) together 
and recognizes that the speaker wishes to take an airplane ride with certain constraints. 

[0017] Speech understanding module 28 includes knowledge database 63 which encodes the semantics of a domain
(i.e., goal to be achieved). In this sense, knowledge database 63 is preferably a domain-specific database as depicted 
by reference numeral 65 and is used by dialog manager 30 to determine whether a particular action related to achieving 
a predetermined goal is possible. 

[0018] The preferred embodiment encodes the semantics via a frame data structure 64. The frame data structure 
64 contains empty slots 66 which are filled when the semantic interpretation of global parser 62 matches the frame. 
For example, a frame data structure (whose domain is purchasing merchandise) includes an empty slot for specifying 
the buyer-requested price for the merchandise. If buyer 20 has provided the price, then that empty slot is filled with 
that information. However, if that particular frame needs to be filled after the buyer has initially provided its request, 
then dialog manager 30 instructs computer response module 34 to ask buyer 20 to provide a desired price. 
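
The slot-filling behaviour described above can be illustrated with a minimal sketch. The class name `Frame`, the helper `next_prompt`, and the slot names are hypothetical and chosen only for illustration; the specification does not prescribe a concrete representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """A semantic frame: a named set of slots that start out empty (None)."""
    domain: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def fill(self, slot: str, value: str) -> None:
        # Only slots declared by the frame can be filled.
        if slot not in self.slots:
            raise KeyError(f"unknown slot: {slot}")
        self.slots[slot] = value

    def missing(self) -> List[str]:
        # Slots that are still empty after the parser's interpretation was applied.
        return [name for name, value in self.slots.items() if value is None]


def next_prompt(frame: Frame) -> Optional[str]:
    """Return a question for the first empty slot, or None if the frame is full."""
    empty = frame.missing()
    if not empty:
        return None
    return f"Please provide the {empty[0]}."
```

In this sketch a dialog manager would call `next_prompt` after each utterance and pass any returned question to the response module; a return value of `None` would indicate that the frame is complete.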
[0019] Preferably, computer response module 34 is multi-modal in being able to provide a response to a user via
speech synthesis, text, or graphics. For example, if the user has requested directions to a particular location, the
computer response could display a graphical map with the terms on the map being translated by translation module 
40. Moreover, computer response module 40 can speak the directions to the user through audio part 68. However, it 
is to be understood that the present invention is not limited to having all three modes present as it can contain one or 
more of the modes of the computer response module 34. 

[0020] Audio part 68 uses the semantics that have been recognized to generate a sentence in the buyer's target 
language based on the semantic concept. This generation process preferably uses a paired dictionary of sentences 
in both the initial and target language. In an alternate embodiment, sentences are automatically generated based on 
per type sentences which have been constructed from the slots available in a semantic frame. 
[0021] The frame data structure 64 preferably includes multiple frames which each in turn have multiple slots. One 
frame may have slots directed to attributes of a shirt, such as color, size, and price. Another frame may have slots
directed to attributes associated with the location to which the shirt is to be sent, such as name, address, and phone number.
[0022] The following reference discusses global parsers and frames: R. Kuhn and R. D. Mori, Spoken Dialogues 
with Computers (Chapter 14: Sentence Interpretation), Academic Press, Boston (1998). 

[0023] Dialog manager 30 uses dialog history data file 67 to assist in filling in empty slots before asking the speaker 
for the information. Dialog history data file 67 contains a log of the conversation which has occurred through the device 
of the present invention. For example, if a speaker utters "get me a flight to Boston on January 23rd which also serves 
lunch", the dialog manager 30 examines the dialog history data file 67 to check what city names the speaker may have 
mentioned in a previous dialog exchange. If the speaker had mentioned that he was calling from Detroit, then the dialog 
manager 30 fills the empty slot of the source city with the city name of "Detroit". If a sufficient number of slots have 
been filled, then the present invention will ask the speaker to verify and confirm the flight plan. Thus, if any assumptions 
made by the dialog manager 30 through the use of dialog history data file 67 prove to be incorrect, then the speaker 
can correct the assumption. 
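
As a rough illustration of how a dialog history might be consulted before prompting the speaker, the following sketch fills empty slots from a log of earlier exchanges and reports which slots were assumed, so that they can be confirmed. The function name `fill_from_history`, the slot names, and the representation of the history as (slot, value) pairs are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple


def fill_from_history(slots: Dict[str, Optional[str]],
                      history: List[Tuple[str, str]]) -> List[str]:
    """Fill empty slots from earlier dialog exchanges.

    `history` is a chronological log of (slot name, value) pairs.
    Returns the names of the slots filled by assumption, which the
    speaker should be asked to verify.
    """
    assumed = []
    for name in list(slots):
        if slots[name] is not None:
            continue  # already provided in the current utterance
        # Search the log from the newest entry to the oldest.
        for past_name, past_value in reversed(history):
            if past_name == name:
                slots[name] = past_value
                assumed.append(name)
                break
    return assumed
```

For the flight example, a history entry recording "Detroit" as the source city would fill the empty source-city slot, and that slot name would be returned for confirmation.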

[0024] In another alternate embodiment computer response module 34 is instructed by dialog manager 30 to perform 
a search on the remote database 70 in order to provide buyer 20 with information about that piece of merchandise. In 
this non-limiting example, dialog manager 30 can instruct computer response module 34 to search the store's remote 
database 70 for the price range of the merchandise for which the buyer 20 is interested. The alternate embodiment 
substantially improves the quality of the dialog between buyer 20 and salesperson 22 by providing information to buyer 
20 so that buyer 20 can formulate a more informed request to salesperson 22. 

[0025] Dialog manager 30 assumes an integral role in the dialog by performing a back-and-forth dialog with buyer 
20 before buyer 20 communicates with salesperson 22. In such a role, dialog manager 30 using the teachings of the 
present invention is able to effectively manage the turn-taking aspect of a human-like back-and-forth dialog. Dialog 
manager 30 is able to make its own decision about which direction the dialog with buyer 20 will take next and when to
initiate a new direction.

[0026] For example, if buyer 20 has requested a certain type of shirt within a specified price range, dialog manager 
30 determines whether such a shirt is available within that price range. Such a determination is made via remote 
database 70. In this example, dialog manager 30 determines that such a shirt is not available in the buyer's price range;
however, another type of shirt is available in that price range. Thus, dialog manager 30 can determine whether a
particular action or goal of the buyer is feasible and assist the buyer to accomplish that goal. 
[0027] The present invention analyzes and extracts semantically important and meaningful topics from a loosely 
structured, natural language text which may have been generated as the output of an automatic speech recognition 
system (ASR) used by a dialogue or speech understanding system. The present invention translates the natural lan- 
guage text input to a new representation by generating well-structured tags containing topic information and data, and 
associating each tag with the segments of the input text containing the tagged information. In an alternate embodiment, 
tags are generated as a separate list, or as a semantic frame. 

[0028] Figure 3 depicts a non-limiting example of the role of the local parser of the present invention in a speech
understanding system, such as an automated online travel reservation specialist with a speech interface. The following
topics can be potential targets for the present invention: flight arrival and departure times, and dates possibly
with ranges and constraints; city-names involved in the flight; fare/cost information involving currency amounts; class 
of seats; meal information; flight-numbers; names of airlines; the stop-overs of the flight, etc. 
[0029] The example includes a possible input sentence 100 as generated from a continuous speech recognition 
system and containing recognition mistakes. The corresponding output 102 is a possible interpretation by the present 
invention where three tags have been generated, one corresponding to city-names 104, one to time 106, and one to 
date 108. 

[0030] Robustness is a feature of the present invention as the input can contain grammatically incorrect English 
sentences, such as in the example above, due to the following reasons: the input to the recognizer is casual, dialog 
style, natural speech and can contain broken sentences, partial phrases; the speech recognition may introduce inser- 
tion, omission, or mis-recognition errors even when the speech input is considered correct. The present invention deals 
robustly with all types of input and extracts as much information as possible. 

[0031] Figure 4 depicts the different components of the novel local parser 60 of the present invention. The present 
invention preferably utilizes generalized parsing techniques in a multi-pass approach as a fixed-point computation. 
Each topic is described as a context-sensitive LR (left-right and rightmost derivation) grammar, allowing ambiguities. 
The following are references related to context-sensitive LR grammars: A. Aho and J. D. Ullman, Principles of Compiler
Design, Addison Wesley Publishing Co., Reading, Massachusetts (1977); and M. Tomita, Generalized LR Parsing,
Kluwer Academic Publishers, Boston, Massachusetts (1991).

[0032] At each pass of the computation, a generalized parsing algorithm is used to generate preferably all possible 
(both complete and partial) parse trees independently for each targeted topic. Each pass potentially generates several 
alternative parse-trees, each parse-tree representing a possibly different interpretation of a particular topic. The multiple 
passes through preferably parallel and independent paths result in a substantial elimination of ambiguities and overlap
among different topics. The present invention is a systematic way of scoring all possible parse-trees so that the (N) 
best candidates are selected utilizing the contextual information present in the system. 

[0033] Local parsing system 60 is carried out in three stages: lexical analysis 120; parallel parse-forest generation 
for each topic (for example, generators 130 and 132); and analysis and synthesis of parsed components as shown 
generally by reference numeral 134. The preferred embodiment depicts the structure for the inputs to and outputs from 
the local parser in Exhibit A below. 

Lexical analysis: 

[0034] A speaker utters a phrase that is recognized by an automatic speech recognizer 117 which generates input 
sentence 118. Lexical analysis stage 120 identifies and generates tags for the topics (which do not require extensive 
grammars) in input sentence 118 using lexical filters 126 and 128. These include, for example, city-names; class of 
seats; meal information; names of airlines; and information about stop-overs. A regular-expression scan of the input 
sentence 118 using the keywords involved in the mentioned exemplary tags is typically sufficient at this level. Also
performed at this stage is the tagging of words in the input sentence that are not part of the lexicon of a particular grammar.
These words are indicated using an X-tag so that such noise words are replaced with the letter "X".
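
The lexical analysis stage can be sketched as follows, assuming a simple keyword table per topic and a per-grammar lexicon. The function names `keyword_tags` and `x_tag`, the keyword lists, and the tokenization rule are illustrative assumptions only.

```python
import re
from typing import Dict, List, Set, Tuple


def keyword_tags(sentence: str,
                 keywords: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Regular-expression scan: emit a (topic, matched word) tag per keyword hit."""
    tags = []
    for topic, words in keywords.items():
        pattern = r"\b(" + "|".join(re.escape(w) for w in words) + r")\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            tags.append((topic, match.group(1)))
    return tags


def x_tag(sentence: str, lexicon: Set[str]) -> List[str]:
    """Replace every word outside the grammar's lexicon with the noise marker "X"."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return [tok if tok in lexicon else "X" for tok in tokens]
```

A keyword scan of this kind suits topics that do not require extensive grammars, while the X-tagging step prepares the input for a topic grammar by masking words the grammar does not know.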

Parallel parse-forest generation: 

[0035] The present invention uses a high-level general parsing strategy to describe and parse each topic separately, 
and generates tags and maps them to the input stream. Due to the nature of unstructured input text 118, each individual
topic parser preferably accepts as large a language as possible, ignoring all but important words, dealing with insertion 
and deletion errors. The parsing of each topic involves designing context-sensitive grammar rules using a meta-level 
specification language, much like the ones used in LR parsing. Examples of grammars include grammar A 140 and 
grammar B 142. Using the present invention's approach, topic grammars 140 and 142 are described as if they were 
an LR-type grammar, containing redundancies and without eliminating shift and reduce conflicts. The result of parsing 
an input sentence is all possible parses based on the grammar specifications. 

[0036] Generators 130 and 132 generate parse forests 150 and 152 for their topics. Tag-generation is done by synthesizing
actual information found in the parse tree obtained during parsing.

[0037] Figure 4 depicts tag generation via tag and score generators 160 and 162 which respectively generate tags 
164 and 166. Each identified tag also carries information about what set of input words in the input sentence are 
covered by the tag. Subsequently the tag replaces its cover-set. In the preferred embodiment, context information 167 
is utilized for tag and score generations, such as by generators 160 and 162. Context information 167 is utilized in the 
scoring heuristics for adjusting weights associated with a heuristic scoring factor technique that is discussed below. 
Context information 167 preferably includes word confidence vector 168 and dialogue context weights 169. However, 
it should be understood that the present invention is not limited to using both word confidence vector 168 and dialogue
context weights 169, but also includes using one to the exclusion of the other, as well as not utilizing context information
167 within the present invention. 

[0038] Automatic speech recognition process block 117 generates word confidence vector 168 which indicates how 
well the words in input sentence 118 were recognized. Dialog manager 30 generates dialogue context weights 169 by
determining the state of the dialogue. For example, dialog manager 30 asks a user about a particular topic, such as, 
what departure time is preferable. Due to this request dialog manager 30 determines that the state of the dialogue is 
time-oriented. Dialog manager 30 provides dialogue context weights 169 in order to inform the proper processes to 
more heavily weight the detected time-oriented words. 
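
One way the two kinds of context information could enter a tag's score is sketched below. The multiplicative combination, the base weight of 10 per word, and the name `weighted_terminal_score` are assumptions made purely for illustration; the specification states only that confidence measures and dialogue context weights adjust the scoring weights.

```python
from typing import Dict, Iterable, Sequence


def weighted_terminal_score(cover: Iterable[int],
                            confidence: Sequence[float],
                            topic: str,
                            context_weights: Dict[str, float],
                            base: float = 10.0) -> float:
    """Sum a per-word contribution over the words covered by a tag.

    Each covered word contributes `base`, scaled by its ASR confidence
    and by the dialogue context weight of the tag's topic (1.0 if the
    dialogue state does not favour that topic).
    """
    topic_weight = context_weights.get(topic, 1.0)
    return sum(base * confidence[i] * topic_weight for i in cover)
```

With a time-oriented dialogue state, a weight greater than 1.0 for the time topic would raise the scores of time tags relative to competing tags covering the same words.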

Synthesis of Tag-components: 

[0039] The topic spotting parser of the previous stage generates a significant amount of information that needs to 
be analyzed and combined together to form the final output of the local parser. The present invention is preferably as 
"aggressive" as possible in spotting each topic resulting in the generation of multiple tag candidates. Additionally in 
the presence of numbers or certain key-words, such as "between", "before", "and", "or", "around", etc., and especially 
if these words have been introduced or dropped due to recognition errors it is possible to construct many alternative 
tag candidates. For example, the input sentence 220 in Figure 5 could have been a result of insertion or deletion errors. 
The combining phase of the present invention determines which tags form a more meaningful interpretation of the 
input. The present invention defines heuristics and makes a selection based on them using an N-best candidate selection
process. Each generated tag corresponds to a set of words in the input word string, called the tag's cover-set. 
[0040] A heuristic is used that takes into account the cover-sets of the tags used to generate a score. The score 
roughly depends on the size of the cover-set, the sizes (in number of words) of the gaps within the covered items,
and the weights assigned to the presence of certain keywords. In the preferred embodiment, the ASR-derived confidence
vector and dialog context information are utilized to assign priorities to the tags. For example, applying cost-tag parsing
first potentially removes cost-related numbers that are easier to identify uniquely from the input stream, and leaves
fewer numbers to create ambiguities with other tags. Preferably, dialog context information is used to adjust the priorities.

N-Best Candidates Selection 

[0041] With reference back to Figure 4, at the end of each pass, an N-best processor 170 selects the N-best candi- 
dates based upon the scores associated with the tags and generates the topic-tags, each representing the information 
found in the corresponding parse-tree. Once topics have been discovered this way, the corresponding words in the 
input can be substituted with the tag information. This substitution transformation eliminates the corresponding words 
from the current input text. The output 180 of each pass is fed-back to the next pass as the new input, since the 
substitutions may help in the elimination of certain ambiguities among competing grammars or help generate better 
parse-trees by filtering out overlapping symbols. 

[0042] Computation ceases when no additional tags are generated in the last pass. The output of the final pass 
becomes the output of the local parser to global parser 62. Since each phase can only reduce the number of words in 
its input and the length of the input text is finite, the number of passes in the fixed-point computation is linearly bounded 
by the size of its input. 
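
The fixed-point character of the multi-pass computation can be sketched as a loop that repeats a pass until its output equals its input. The function `toy_pass` and the phrase table `PHRASES` below are hypothetical stand-ins for the parse-forest generation and N-best selection of a real pass.

```python
from typing import Callable, List, Tuple

# Hypothetical phrase table standing in for the topic grammars.
PHRASES = [
    (("five", "thirty", "pm"), "<TIME>"),
    (("two", "hundred", "dollars"), "<COST>"),
]


def toy_pass(tokens: List[str]) -> List[str]:
    """Substitute at most one known phrase with its tag (one pass)."""
    for phrase, tag in PHRASES:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                return tokens[:i] + [tag] + tokens[i + n:]
    return tokens


def run_to_fixed_point(tokens: List[str],
                       one_pass: Callable[[List[str]], List[str]]
                       ) -> Tuple[List[str], int]:
    """Feed each pass's output back as input until no new tag is generated."""
    passes = 0
    while True:
        new_tokens = one_pass(tokens)
        passes += 1
        if new_tokens == tokens:  # the last pass generated no additional tags
            return tokens, passes
        tokens = new_tokens
```

Because every productive pass in this sketch shortens the token list, the loop runs at most one more time than there are words in the input, which mirrors the linear bound stated above.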

[0043] The following novel scoring factors are used to rank the alternative parse trees based on the following at- 
tributes of a parse-tree: 

• Number of terminal symbols.
• Number of non-terminal symbols.
• The depth of the parse-tree.
• The size of the gaps in the terminal symbols.
• ASR-Confidence measures associated with each terminal symbol.
• Context-adjustable weights associated with each terminal and non-terminal symbol.

Each path preferably corresponds to a separate topic that can be developed independently, operating on a small 
amount of data, in a computationally inexpensive way. The architecture of the present invention is flexible and modular 
so incorporating additional paths and grammars for new topics, or changing heuristics for particular topics, is
straightforward; this also allows developing reusable components that can be shared among different systems easily.
[0044] Figure 6 provides a non-limiting depiction of a tree in relation to a discussion of the tag scoring heuristics.
Figure 6 depicts an input string 250 and a sample parse-tree 252. The parse-tree rooted at St 254 identifies the sub-sequence
{w3, w4, w7, w8, w10} as a possible parse. This parse has 5 terminal symbols {w3, w4, w7, w8, w10}, with
gaps between w4 and w7 (size = 2) and between w8 and w10 (size = 1), for a total gap size of 3. Parse tree 252 has four
non-terminals: St 254, NTa 256, NTb 258, and NTc 260. The depth of parse tree 252 is three due to the traversal from St
254 to NTb 258 to NTa 256 to w3.
[0045] A possible score for this parse is: 

#Terminals*10 - (GapSize*1.5) - Depth + #Non-terminals = 50 - 4.5 - 3 + 4 = 46.5
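
The example score can be reproduced with a short sketch. The weights 10, 1.5, 1 and 1 are taken from the formula above; the function names `gap_size` and `parse_score` and the representation of a cover-set as a sorted list of word positions are assumptions for illustration.

```python
from typing import Sequence


def gap_size(positions: Sequence[int]) -> int:
    """Total number of uncovered words between consecutive covered positions."""
    return sum(b - a - 1 for a, b in zip(positions, positions[1:]))


def parse_score(n_terminals: int, gaps: int, depth: int, n_nonterminals: int,
                w_terminal: float = 10.0, w_gap: float = 1.5,
                w_depth: float = 1.0, w_nonterminal: float = 1.0) -> float:
    """Heuristic score: reward coverage, penalize gaps and tree depth."""
    return (n_terminals * w_terminal
            - gaps * w_gap
            - depth * w_depth
            + n_nonterminals * w_nonterminal)


# The parse of Figure 6 covers words w3, w4, w7, w8 and w10.
cover = [3, 4, 7, 8, 10]
score = parse_score(len(cover), gap_size(cover), depth=3, n_nonterminals=4)
```

Keeping the weights as parameters reflects the statement that non-uniform and context-adjusted weights may be used in place of the fixed values of this example.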

[0046] The present invention also includes utilizing non-uniform weights which can be assigned to the non-terminal 
and terminal nodes. Also, confidence measures are preferably utilized to adjust the weights of one or more of the 
scoring factors. For example, a likelihood ratio algorithm can be utilized to compute confidence scores (see, for exam- 
ple, the following reference: R. Sukkar and Chin-Hui Lee, Vocabulary Independent Discriminative Utterance Verification 
for Non-Key Word Rejection in Sub-Word Based Speech Recognition, IEEE Transactions on Speech and Audio 
Processing, Vol. 4, No. 6, pages 420-29 (1996)). 

[0047] Figure 7 provides another non-limiting depiction of a tree in relation to a discussion to the scoring heuristics. 
Five parse trees are shown at reference numerals 270, 272, 274, 276, and 278. With respect to the five possible parse 
trees and corresponding tags shown in Figure 7, the following scoring approach is used: 
[0048] The score based system results in Tag#5 being picked as the best candidate for Date targets; this selection
eliminates Tag#2 and Tag#4 from further consideration due to the overlap with Tag#1 lexicon. This leaves the parse
for Tag#1 as the next best parse, so Tags #5 and #1 are selected.
[0049] The present invention utilizes multiple passes as illustrated in Figure 8. At reference numeral 290, outputs of
the present invention are depicted at different passes in processing input sentence 294. Parse tree forest 296 is generated
during the first pass and helps to generate first pass output 298. First pass output 298 has associated time tag
300 with the words of input sentence 294 "Five thirty pm".

[0050] First pass output 298 is used as an input for a second pass processing of input sentence 294. Parse forest
302 is generated during processing of the second pass and results in a cost tag 304 being generated. In one embodiment
of the present invention, the reason why the first pass processing did not parse the "hundred dollars" part of input
sentence 294 is due to N-best tag selection and combining block 170 of Figure 4. During the first pass, due to lexical
filtering and aggressive parsing, the best cost parse is "five hundred dollars", and the best time parse is parse tree
forest 296 for "after five thirty p-m". Since the word "five" is shared, the selection process invalidates the best cost
parse and generates the time tag for "five thirty p-m". However, the end of the second pass results in a filtered string
308 which generates the cost tag 304 successfully.
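
The way a shared word invalidates a competing parse can be sketched as a greedy selection over cover-sets. The scores and word positions below are hypothetical, and the greedy strategy itself is an assumption; the specification describes the outcome of the selection rather than a particular procedure.

```python
from typing import FrozenSet, List, NamedTuple


class Tag(NamedTuple):
    name: str
    score: float
    cover: FrozenSet[int]  # positions of the input words covered by the tag


def select_tags(candidates: List[Tag], n: int) -> List[Tag]:
    """Pick up to n tags by descending score, skipping overlapping cover-sets."""
    chosen: List[Tag] = []
    for tag in sorted(candidates, key=lambda t: t.score, reverse=True):
        if len(chosen) == n:
            break
        if all(tag.cover.isdisjoint(other.cover) for other in chosen):
            chosen.append(tag)
    return chosen


# Hypothetical first-pass candidates: both parses claim the word at position 1 ("five").
first_pass = [
    Tag("time", 35.0, frozenset({0, 1, 2, 3})),  # "after five thirty p-m"
    Tag("cost", 25.5, frozenset({1, 4, 5})),     # "five hundred dollars"
]
selected = select_tags(first_pass, n=2)
```

Here only the time tag survives the first pass; once its words are substituted, a later pass can parse the remaining cost words without the conflict.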

Grammar 

[0051] In the preferred embodiment, each topic is expressed as a generalized LR(0) grammar using the following
syntax:

TopicGrammar = Rule+.
Rule = "*" ID "." |
       ID "." |
       ID "=" ID* ("|" ID*)* "." |
       ID ":" ID+

[0052] The grammar syntax informally states that the grammar is expressed as a series of grammar rules, where
each grammar rule describes a context-sensitive substitution rule for either a terminal or a non-terminal grammar
symbol.

[0053] Figure 9 depicts an exemplary grammar for parsing the cost involving dollar or yen amounts. The first rule
<* COST> 320 declares the root non-terminal symbol to be COST.

[0054] Each subsequent rule of the form <A = X Y Z.> specifies a non-terminal symbol, A, and a substitution rule
where the symbol A can be substituted in a rightmost derivation by the three right-hand-side grammar symbols, X Y
Z, each of which is either a terminal or non-terminal symbol. For example, the rule 324:

C_Gen = C_Num | C_Num C_Currency.

defines C_Gen as a non-terminal that can be reduced with either a number (C_Num) or a number followed by a currency
symbol (C_Currency). Terminal symbols are defined using the form <t: s1 s2>. For example, the rule 328:

c_yen: yen yens.

defines c_yen to be a terminal symbol matching "yen" or "yens" as the next token in the input stream.
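
To make the three rule forms concrete, the following sketch reads a single rule line into a small tuple. It assumes that alternatives are separated by a vertical bar and that each rule ends with a period; the function name `parse_rule` and the tuple layout are hypothetical, and the handling of the bare `ID "."` form is a guess, since its meaning is not spelled out above.

```python
from typing import List, Tuple, Union

Rule = Tuple[str, str, Union[List[str], List[List[str]]]]


def parse_rule(line: str) -> Rule:
    """Classify one grammar rule line and split it into its symbols."""
    body = line.strip().rstrip(".").strip()
    if body.startswith("*"):
        # Root declaration, e.g. "* COST."
        return ("root", body[1:].strip(), [])
    if "=" in body:
        # Non-terminal rule: each alternative is a sequence of symbols.
        lhs, rhs = body.split("=", 1)
        alternatives = [alt.split() for alt in rhs.split("|")]
        return ("nonterminal", lhs.strip(), alternatives)
    if ":" in body:
        # Terminal rule: the listed words all match this terminal symbol.
        lhs, rhs = body.split(":", 1)
        return ("terminal", lhs.strip(), rhs.split())
    # Bare identifier form; treated here as a plain declaration (assumption).
    return ("declare", body, [])
```

A full grammar reader would apply this to every rule in the grammar of Figure 9 and index the results by their left-hand-side symbol.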
[0055] The cost grammar matches all words that are not defined as terminals under the X rule. A lexical filter is used
to replace all input words that are not relevant to COST rules with the word "x". Accordingly, the X rule matches one
or more consecutive "x"s.

[0056] Figure 10 shows a non-limiting example of parsing the sentence 400: "flights under five hundred dollars."
Each line represents the application of a grammar rule, for example at reference numeral 404:

C_Tens_2_3 = c_num_2_3.

represents a node in the parse forest where the grammar symbol C_Tens covers the range [2-3], i.e. the word "five".
Similarly line 408:

c_qualifier_1_2: "under".
represents terminal symbol c_qualifier matching the range [1-2], i.e. the word "under". The root symbol rule 412,
COST_0_5, covers the entire range, signalling a successful parse which yielded a unique parse for the entire input.
Other root symbol rules are depicted which have their own parse trees shown in Figure 10. For example, Figure 10
depicts a parse tree for root symbol 437. If multiple parses are used, a rule contains the alternatives separated by "|"s.
Also notice, in a non-limiting manner, how the first word, "flights", is skipped over by the X rule.
[0057] Figure 11 shows a partial graphical tree depiction of the data of Figure 10. For example, root symbol rule is 
depicted by reference numeral 412 on Figure 11. 

Tag Generation: 

[0058] The preferred tag generation method uses a parse forest, and generates the tags as specified by the output 
specifications. The tag generation algorithm (called reduce) uses a synthesis and inheritance approach in order to 
construct each tag utilizing the information found in the parse tree (note: the usage of the name "reduce" herein is 
separate and different from the term reduce (as in shift/reduce actions) used in the LR parsing literature). The reduce 
algorithm used by the tag generation method operates as follows: 
Input: node $\alpha_{i,j}$ (any node in the parse forest). 

1. If $\alpha_{i,j}$ is a terminal rule, return the right-hand side (which is a token in the input stream at position $i$), either 
unchanged or after assigning it a meaning, for example by applying a conversion from ASCII to numeric for a digit. 

2. Remove all the X-rules from the right-hand side, yielding a rule of the form 

$$\alpha_{i,j} = \beta_{0,i_0,j_0}\ \beta_{1,i_1,j_1}\ \ldots\ \beta_{k,i_k,j_k}$$

where $\beta \neq X$. 

3. Evaluate a new attribute, $a$, for $\alpha_{i,j}$ by concatenating the results of reducing the terms on the right-hand side, i.e.: 

$$\alpha_{i,j}.a = \sum_{l=0}^{k} \mathrm{reduce}(\beta_{l,i_l,j_l})$$

where $\Sigma$ is a concatenation operator. 

4. Inherit all the attributes from each reduced term on the right-hand side: 
for each term $\beta_{l,i_l,j_l}$ in the right-hand side 

for each attribute $\phi \in \beta_{l,i_l,j_l}.\mathrm{AttrList}$ 
add $\phi$ to the node's attribute list: 

$$\alpha_{i,j}.\mathrm{AttrList} \mathrel{\cup}= \phi$$

inherit the attribute value: 

$$\alpha_{i,j}.\phi = \beta_{l,i_l,j_l}.\phi$$

5. If necessary, generate new attributes for $\alpha_{i,j}$, possibly utilizing the inherited and computed attributes. All new 
attributes are inherited by the parent nodes, all the way up to the root node. This is the general mechanism by 
which we can construct and initialize the tag structures. 
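
The five steps of the reduce algorithm can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the invention: the `Node` class, the `WORD_VALUES` meaning table, the "width" attribute used for step 5, and the symbol name `c_dollar` are all hypothetical, and concatenation is modelled as list concatenation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-forest node: a grammar symbol covering input range [start, end]."""
    symbol: str
    start: int
    end: int
    children: list = field(default_factory=list)  # right-hand side; empty for terminals
    token: str = ""                               # matched input word (terminals only)
    attrs: dict = field(default_factory=dict)     # attribute name -> value


WORD_VALUES = {"five": 5, "hundred": 100}  # hypothetical meaning table for step 1


def reduce_node(node):
    # Step 1: a terminal returns its token, optionally converted to a meaning.
    if not node.children:
        return [WORD_VALUES.get(node.token, node.token)]
    # Step 2: remove the X-rules (filler words) from the right-hand side.
    terms = [child for child in node.children if child.symbol != "X"]
    value = []
    for term in terms:
        # Step 3: concatenate the results of reducing each remaining term.
        value.extend(reduce_node(term))
        # Step 4: inherit every attribute of the reduced term.
        node.attrs.update(term.attrs)
    node.attrs["a"] = value
    # Step 5 (hypothetical): derive a new attribute for this node.
    node.attrs["width"] = node.end - node.start
    return value


tree = Node("COST", 0, 5, children=[
    Node("X", 0, 1, token="x"),
    Node("c_qualifier", 1, 2, token="under"),
    Node("C_Tens", 2, 3, children=[Node("c_num", 2, 3, token="five")]),
    Node("c_num", 3, 4, token="hundred"),
    Node("c_dollar", 4, 5, token="dollars"),
])
print(reduce_node(tree))  # ['under', 5, 100, 'dollars']
```

In this sketch an attribute computed at a node overrides an inherited attribute of the same name, which is one possible design choice; the text does not say how such conflicts are resolved.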

[0059] Figure 12 depicts operation of the present invention within an exemplary application of a buyer attempting to 
buy a particular shirt while speaking in a first language to a seller who speaks in a second language. The start indication 
block 500 indicates process block 504 is to be processed. At process block 504, the buyer speaks in a first language 
about a particular shirt. At process block 508, the buyer's speech is recognized, and predetermined parts of the buyer's 
speech are determined via the local parser of the present invention at process block 512. 

[0060] Process block 516 determines the semantic portions of the buyer's speech via a global parser. Process block 
520 translates the determined semantic parts to a second language, which is then vocalized at process block 524. At 
process block 528, any response from the salesperson or buyer is processed according to the present invention. 
Processing terminates at process block 532. 

[0061] Figure 13 depicts the operational steps associated with the multi-pass architecture of the local parser of the 
present invention. Start indication block 550 indicates that process block 554 is to be executed, wherein an input sentence 
is received. Process block 557 performs automatic speech recognition for the input sentence. 
[0062] Iteration block 566 performs the following steps for each grammar. Preferably, the processing for each grammar 
is performed substantially concurrently with the processing for a second grammar. Process block 570 applies a 
lexical filter to the input sentence using the grammar selected by iteration block 566. 

[0063] Process block 574 generates a parse forest using the selected grammar, and process block 578 generates 
tags for the input sentence using confidence vectors from process block 557 and using dialogue context weights from 
process block 599 (if available from previous processing of the dialogue manager). It is to be understood, however, 
that the present invention is not limited to using context-related data at this stage in processing, but also includes 
utilizing no context information at this stage. 

[0064] Process block 582 generates a score for the tags that were generated at process block 578. Process block 
586 selects the N-best tags based upon the score generated by process block 582. Process block 590 generates the 
tag output, and iteration terminator block 594 repeats the process until each grammar has been utilized. 
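
The scoring and N-best selection of process blocks 582 and 586 can be sketched as a weighted sum over parse-forest attributes. The factor names below follow those listed in the claims (number of terminals, gap size, depth, number of non-terminals); the weight values, their signs, and the candidate tags are assumptions chosen only for illustration.

```python
import heapq

# Hypothetical weights: the text names the factors but not their weights or signs.
WEIGHTS = {"terminals": 1.0, "non_terminals": 0.25, "gap_size": -0.5, "depth": -0.25}


def score(features, weights=WEIGHTS):
    """Weighted sum of parse-forest attributes for one candidate tag."""
    return sum(weight * features[name] for name, weight in weights.items())


def n_best(candidates, n):
    """Return the n highest-scoring (tag, features) pairs, best first."""
    return heapq.nlargest(n, candidates, key=lambda pair: score(pair[1]))


candidates = [
    ("COST[< 500$]", {"terminals": 4, "non_terminals": 4, "gap_size": 0, "depth": 2}),
    ("FNUM[500]", {"terminals": 2, "non_terminals": 2, "gap_size": 2, "depth": 2}),
    ("DATE[5/1]", {"terminals": 1, "non_terminals": 2, "gap_size": 3, "depth": 2}),
]
best = [tag for tag, _ in n_best(candidates, 2)]
print(best)  # ['COST[< 500$]', 'FNUM[500]']
```

Word confidence vectors or dialogue context weights could be modelled in such a sketch by passing a different `weights` mapping to `score`.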
[0065] If each grammar has been utilized for a particular pass, then decision block 598 inquires whether any additional 
tags have been generated. If additional tags have been generated, then processing continues at iteration block 566. 
If no additional tags were generated, then processing continues at process block 599. At process block 599, global 
parsing is performed, and then the dialog manager processing is performed wherein context weights are determined 
that could be used, if needed, in the next processing of an input sentence. Processing terminates at end block 602. 
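
The multi-pass control flow of Figure 13 (iterate over every grammar, then repeat while additional tags are generated) can be sketched as below. The two grammar functions and the `NUM[500]` tag are hypothetical stand-ins; real grammars would build parse forests and score their tags as described above.

```python
def multi_pass(words, grammars, max_passes=10):
    """Apply every grammar on each pass until a pass yields no additional tags."""
    tags = set()
    for _ in range(max_passes):
        additional = set()
        for grammar in grammars:  # one iteration per grammar
            additional |= grammar(words, frozenset(tags)) - tags
        if not additional:        # no additional tags were generated: stop
            break
        tags |= additional
    return tags


def number_grammar(words, tags):
    # Hypothetical grammar: tags the amount directly from the words.
    return {"NUM[500]"} if {"five", "hundred"} <= set(words) else set()


def cost_grammar(words, tags):
    # Hypothetical grammar: needs a number tag produced by an earlier pass.
    return {"COST[< 500$]"} if "NUM[500]" in tags and "under" in words else set()


words = "flights under five hundred dollars".split()
result = multi_pass(words, [cost_grammar, number_grammar])
print(sorted(result))  # ['COST[< 500$]', 'NUM[500]']
```

Within one pass each grammar sees only the tags of earlier passes, so the grammars are independent of each other and could be run concurrently.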
[0066] While the invention has been described in its presently preferred form, it is to be understood that there are 
numerous applications and implementations for the present invention. Accordingly, the invention is capable of modification 
and changes without departing from the spirit of the invention as set forth in the appended claims. 

EXHIBIT A 

[0067] Input: ASCII text string, s, containing a sequence of white-space separated words, w_i, without any punctuation 
marks. The words are composed of lower-case letters of the English alphabet and the single-quote character [note: no 
digits], 

where 

s = w_0 w_1 ... w_n 

w = [a-z']+ 
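
A minimal Python sketch of checking a string against this input format is given below. The function name `split_input` is an assumption for illustration; the pattern is the word definition stated above (lower-case letters and the single-quote character).

```python
import re

WORD = re.compile(r"[a-z']+")


def split_input(s):
    """Split s on white space and check every word against w = [a-z']+."""
    words = s.split()
    invalid = [word for word in words if not WORD.fullmatch(word)]
    if invalid:
        raise ValueError(f"words outside [a-z']+: {invalid}")
    return words


print(split_input("i'd like flights under five hundred dollars"))
```

A string containing digits or punctuation, such as "flight 500", is rejected with a `ValueError` in this sketch.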

[0068] Output: ASCII text string, out, containing a sequence of white-space separated words or tags without punctuation 
marks, 

where 

out = wt_0 wt_1 ... wt_m 
wt = w | t 
w = [a-z']+ 
t = dateTag | timeTag | costTag | flightNumTag | cityNameTag | 
    airlineTag | stopTag | classTag | mealTag | typeTag 

dateTag = DATE[d] |              ; plain date 
          DATE[d-d] |            ; date range 
          DATE[< d] |            ; before date 
          DATE[> d] |            ; after date 
          DATE[~ d] |            ; around date 
          DATE[d] || DATE[d] |   ; alternative dates 
          DATE[MIN()]            ; earliest date 

d = dy/mo |                      ; date == day & month 
    X/mo                         ; any day of the month 
dy = 1 | 2 | 3 | ... | 31 
mo = 1 | 2 | ... | 12 

timeTag = TIME[t] |              ; plain time 
          TIME[t-t] |            ; time range 
          TIME[< t] |            ; before time 
          TIME[> t] |            ; after time 
          TIME[~ t] |            ; around time 
          TIME[t] || TIME[t] |   ; alternative times 
          TIME[MIN()] |          ; earliest 
          TIME[MAX()]            ; latest 

t = hrs:min                      ; time = hrs:minutes 
hrs = 1 | 2 | 3 | ... | 24 
min = 01 | 02 | ... | 59 

costTag = COST[c] |              ; plain cost 
          COST[c-c] |            ; cost range 
          COST[< c] |            ; under the amount 
          COST[> c] |            ; over the amount 
          COST[~ c] |            ; around cost 
          COST[MIN()] |          ; cheapest 
          COST[MAX()]            ; most expensive 

c = amount¥ |                    ; amount in yens 
    amount$                      ; amount in dollars 
amount = 1 | 2 | 3 | ... | 99999 

flightNumTag = FNUM[fnum] 
fnum = 1 | 2 | 3 | ... | 9999 

cityNameTag = CITY[cityName] 
cityName = ATLANTA | BALTIMORE | BANGKOK | BOSTON | CHICAGO | DALLAS | DENVER | 
           HONG_KONG | HOUSTON | LOS_ANGELES | L_A | MIAMI | new_orleans | 
           new_york | new_york_city | OAKLAND | OSAKA | PHILADELPHIA | 
           PITTSBURGH | SAN_FRANCISCO | SEOUL | SINGAPORE | TOKYO | TORONTO | 
           VANCOUVER | WASHINGTON 

airlineTag = CITY[airlineName] 
airlineName = american_airlines | u_s_air | JAPAN_AIRLINES | CONTINENTAL | 
              UNITED | SINGAPORE_AIRLINES 

stopTag = STOP[> 0] |            ; any # of stops 
          STOP[= 0]              ; non-stop 

classTag = CLAS[class] 
class = first | business | economy 

mealTag = MEAL[meal]             ; meal info 
meal = dinner | lunch | breakfast 

typeTag = TYPE[oneway] | TYPE[round trip] 



Claims 

1. A computer-implemented speech parsing method for processing an input phrase (118), comprising the steps of: 

(a) providing a plurality of grammars (140, 142) indicative of pre-determined topics; 

(b) generating a plurality of parse forests (150, 152) related to said input phrase (118) using said grammars; 

(c) associating tags (164, 166) with words in said input phrase (118) using said generated parse forests (150, 
152); 

(d) generating scores for said tags (164, 166) based upon attributes of said parse forests (150, 152); and 

(e) selecting tags (164, 166) for use as a parsed representation (180) of said input phrase (118) based upon 
said generated scores. 

2. The speech parsing method of Claim 1 further comprising the step of: 

performing said step (b) a plurality of iterations so that each iteration produces alternate parse forests. 

3. The speech parsing method of Claim 1 wherein said step (b) is performed substantially concurrently for each of 
said grammars. 

4. The speech parsing method of Claim 3 further comprising the step of: 

performing said step (b) a plurality of iterations wherein each iteration produces alternate parse forests with 
respect to each of said grammars. 

5. The speech parsing method of Claim 1 further comprising the step of: 

generating scores for said tags based upon score based factors selected from the group consisting of number 
of terminals, gap size, depth, number of non terminals, and combinations thereof. 

6. The speech parsing method of Claim 5 further comprising the step of: 

weighting at least two of said factors differently. 

7. The speech parsing method of Claim 6 further comprising the step of: 

using context information to weigh at least two of said factors differently. 

8. The speech parsing method of Claim 7 further comprising the steps of: 

generating a word confidence vector for said input phrase substantially during speech recognition of said input 
phrase; and 

weighting at least two of said factors differently based upon said generated word confidence vector. 

9. The speech parsing method of Claim 7 or 8 further comprising the steps of: 

generating a request for information related to a pre-determined topic; 

generating dialogue context weights based upon said generated request for information; and 

weighting at least two of said factors differently based upon said generated dialogue context weights. 

10. The speech parsing method of Claim 7 further comprising the step of: 

using said context information processor substantially in parallel to perform said step (b). 

11. The speech parsing method of Claim 1 further comprising the steps of: 

generating scores for said tags; and 

selecting N-best tags for use in said parsed representation based upon said generated scores. 

12. The speech parsing method of Claim 11 further comprising the steps of: 

performing said steps (b) and (c) a plurality of iterations; and 

using said selected N-best tags of a first iteration as input related to processing said steps (b) and (c) of a 
second iteration. 

13. The speech parsing method of Claim 1 wherein said tags are indicative of said topics of said grammars. 

14. The speech parsing method of Claim 1 wherein said input phrase is grammatically incorrect with respect to at least 
a portion of said input phrase, said method further comprising the steps of: 

generating a plurality of parse forests related to said grammatically incorrect input phrase using said grammars; 

EP 1 043 711 B1 

associating tags with words in said grammatically incorrect input phrase using said generated parse forests; 
and 

using said tags associated with said words as a parsed representation of said grammatically incorrect input 
phrase. 

15. The speech parsing method of Claim 1 wherein said grammars are based upon left-right context-sensitive grammars. 

16. The speech parsing method of Claim 1 wherein said grammars are based upon left-right context-sensitive grammars 
and contain ambiguities. 

17. The speech parsing method of Claim 1 further comprising the steps of: 

filtering said input phrase via lexical filters; and 

generating said plurality of parse forests based upon said filtered input phrase. 

18. The speech parsing method of Claim 1 further comprising the step of: 

extracting semantic components of said input phrase based upon said tags that are associated with said words. 

19. The speech parsing method of Claim 1 further comprising the step of: 

providing a global parser to extract said semantic components from said input phrase based upon said tags 
that are associated with said words. 

20. The speech parsing method of Claim 19 further comprising the step of: 

managing based upon said extracted semantic components the exchange of dialogue between a speech 
recogniser device and a user. 

21. The speech parsing method of Claim 19 further comprising the step of: 

managing based upon said extracted semantic components the exchange of dialogue between two users 
who speak different languages. 

22. A computer-implemented speech parsing apparatus for processing an input phrase, comprising: 

means for providing a plurality of grammars (140, 142) indicative of pre-determined topics; 

a parse forest generator for generating a plurality of parse forests (150, 152) related to said input phrase (118) 
using said grammars; 

a tag generator for associating tags (164, 166) with words in said input phrase (118) using said generated 
parse forests (150, 152); 

a tag score generator for generating scores for said tags (164, 166) based upon attributes of said parse forests; 
and 

a tag selector for selecting tags for use as a parsed representation (180) of said input phrase (118) based 
upon said generated scores. 

23. The speech parsing apparatus of Claim 22 wherein said parse forest generator is executed a plurality of iterations 
such that each iteration produces alternate parse forests. 

24. The speech parsing apparatus of Claim 22 wherein said parse forest generator is executed a plurality of iterations 
such that each iteration produces alternate parse forests with respect to each of said grammars. 

25. The speech parsing apparatus of Claim 22 wherein said tag score generator generates scores for said tags based 
upon score based factors selected from the group consisting of number of terminals, gap size, depth, number of 
non-terminals, and combinations thereof. 

26. The speech parsing apparatus of Claim 25 wherein said tag score generator weights at least two of said factors 
differently. 

27. The speech parsing apparatus of Claim 26 wherein said tag score generator uses context information to weight 
at least two of said factors differently. 

28. The speech parsing apparatus of Claim 27 further comprising: 

a speech recognition module for performing speech recognition of said input phrase and for generating a word 
confidence vector for said input phrase substantially, 

said tag score generator weighting at least two of said factors differently based upon said generated word 
confidence vector. 

29. The speech parsing apparatus of Claim 27 or 28 further comprising: 

a dialogue manager for generating a request for information related to a pre-determined topic, said dialogue 
manager generating dialogue context weights based upon said generated request for information, said tag 
score generator weighting at least two of said factors differently based upon said generated dialogue context 
weights. 

30. The speech parsing apparatus of Claim 22 further comprising: 

a tag score generator for generating scores for said tags; and 

a tag selector for selecting N-best tags for use in said parsed representation based upon said generated scores. 

31. The speech parsing apparatus of Claim 30 wherein said parse forest generator and tag generator are executed 
a plurality of iterations, said selected N-best tags of a first iteration are used as input to said parse forest generator 
and said tag generator during a second iteration. 

32. The speech parsing apparatus of Claim 22 wherein said tags are indicative of said topics of said grammars. 

33. The speech parsing apparatus of Claim 22 wherein said input phrase is grammatically incorrect with respect to at 
least a portion of said input phrase, said parse forest generators generating a plurality of parse forests related to 
said grammatically incorrect input phrase using said grammars, said tag generator associating tags with words in 
said grammatically incorrect input phrase using said generated 
parse forests, said tags being associated with said words as a parsed representation of said grammatically incorrect 
input phrase. 

34. The speech parsing apparatus of Claim 22 wherein said grammars are based upon left-right context-sensitive 
grammars. 

35. The speech parsing apparatus of Claim 22 wherein said grammars are based upon left-right context-sensitive 
grammars and contain ambiguities. 

36. The speech parsing apparatus of Claim 22 further comprising: 

a lexical filter for filtering said input phrase, said parse forest generator generating said plurality of parse forests 
based upon said filtered input phrase. 

37. The speech parsing apparatus of Claim 22 further comprising: 

a semantic extractor for extracting semantic components of said input phrase based upon said tags that are 
associated with said words. 

38. The speech parsing apparatus of Claim 37 further comprising: 

a global parser to extract said semantic components from said input phrase based upon said tags that are 
associated with said words. 

39. The speech parsing apparatus of Claim 38 further comprising: 

a dialogue manager for managing based upon said extracted semantic components the exchange of dialogue 
between a speech recogniser device and a user. 

40. The speech parsing apparatus of Claim 39 further comprising: 

a dialogue manager for managing based upon said extracted semantic components the exchange of dialogue 
between two users who speak different languages. 



Patentansprüche 

1. Computerimplementiertes Sprachanalyseverfahren zum Verarbeiten einer Eingangsphrase (118), das folgende 
Schritte umfasst: 

(a) Bereitstellen einer Vielzahl von Grammatiken (140, 142), die für vorbestimmte Themen indikativ sind; 

(b) Generieren einer Vielzahl von mit der Eingangsphrase (118) verwandten Parsewäldern (150, 152) unter 
Verwendung der Grammatiken; 

(c) Assoziieren von Etiketten (164, 166) mit Wörtern in der Eingangsphrase (118) unter Verwendung der 
generierten Parsewälder (150, 152); 

(d) Generieren von Bewertungen für die Etiketten (164, 166), basierend auf Attributen der Parsewälder (150, 
152); und 

(e) Auswählen von Etiketten (164, 166) zur Verwendung als geparste Darstellung (180) der Eingangsphrase 
(118), basierend auf den generierten Bewertungen. 

2. Sprachanalyseverfahren nach Anspruch 1, das weiter folgenden Schritt umfasst: 

Ausführen des Schritts (b) über eine Vielzahl von Iterationen, so dass jede Iteration andere Parsewälder 
produziert. 

3. Sprachanalyseverfahren nach Anspruch 1, wobei der Schritt (b) im Wesentlichen gleichzeitig für jede der 
Grammatiken ausgeführt wird. 

4. Sprachanalyseverfahren nach Anspruch 3, das weiter folgenden Schritt umfasst: 

Ausführen des Schritts (b) für eine Vielzahl von Iterationen, wobei jede Iteration andere Parsewälder bezüglich 
jeder der Grammatiken produziert. 

5. Sprachanalyseverfahren nach Anspruch 1, das weiter folgenden Schritt umfasst: 

Generieren von Bewertungen für die Etiketten, basierend auf bewertungsbasierten Faktoren, die aus der 
Anzahl von Endstellen, Lückengröße, Tiefe, Anzahl von Nicht-Endstellen und Kombinationen derselben 
bestehenden Gruppe ausgewählt werden. 

6. Sprachanalyseverfahren nach Anspruch 5, das weiter folgenden Schritt umfasst: 

unterschiedliches Gewichten von mindestens zwei der Faktoren. 

7. Sprachanalyseverfahren nach Anspruch 6, das weiter folgenden Schritt umfasst: 

Verwenden von Kontextinformationen zum unterschiedlichen Gewichten von mindestens zwei der Faktoren. 

8. Sprachanalyseverfahren nach Anspruch 7, das weiter folgende Schritte umfasst: 

Generieren eines Wortvertrauensvektors für die Eingangsphrase, im Wesentlichen während der 
Spracherkennung der Eingangsphrase; und 

unterschiedliches Gewichten von mindestens zwei der Faktoren, basierend auf dem generierten 
Wortvertrauensvektor. 

9. Sprachanalyseverfahren nach Anspruch 7 oder 8, das weiter folgende Schritte umfasst: 

Generieren einer Anforderung nach Informationen, zusammenhängend mit einem vorbestimmten Thema; 

Generieren von Dialogkontextgewichten, basierend auf der generierten Anforderung nach Informationen; und 

unterschiedliches Gewichten von mindestens zwei der Faktoren, basierend auf den generierten 
Dialogkontextgewichten. 

10. Sprachanalyseverfahren nach Anspruch 7, das weiter folgende Schritte umfasst: 

Verwenden des Kontextinformationsprozessors, im Wesentlichen parallel, um den Schritt (b) auszuführen. 

11. Sprachanalyseverfahren nach Anspruch 1, das weiter folgende Schritte umfasst: 

Generieren von Bewertungen für die Etiketten; und 

Auswählen von N-besten Etiketten für die Verwendung in der geparsten Darstellung, basierend auf den 
generierten Bewertungen. 

12. Sprachanalyseverfahren nach Anspruch 11, das weiter folgende Schritte umfasst: 

Ausführen der Schritte (b) und (c) über eine Vielzahl von Iterationen; und 

Verwenden der ausgewählten N-besten Etiketten einer ersten Iteration als Eingabe, zusammenhängend mit 
der Verarbeitung der Schritte (b) und (c) einer zweiten Iteration. 

13. Sprachanalyseverfahren nach Anspruch 1, wobei die Etiketten indikativ für die Themen der Grammatiken sind. 

14. Sprachanalyseverfahren nach Anspruch 1, wobei die Eingangsphrase bezüglich mindestens eines Teils der 
Eingangsphrase grammatikalisch inkorrekt ist, wobei das Verfahren weiter folgende Schritte umfasst: 

Generieren, unter Verwendung der Grammatiken, einer Vielzahl von Parsewäldern, zusammenhängend mit 
der grammatikalisch inkorrekten Eingangsphrase; 

Assoziieren von Etiketten mit Wörtern in der grammatikalisch inkorrekten Phrase unter Verwendung der 
generierten Parsewälder; und 

Verwenden der mit den Wörtern assoziierten Etiketten als analysierte Darstellung der grammatikalisch 
inkorrekten Eingangsphrase. 

15. Sprachanalyseverfahren nach Anspruch 1, wobei die Grammatiken auf links-rechts kontextsensitiven 
Grammatiken basieren. 

16. Sprachanalyseverfahren nach Anspruch 1, wobei die Grammatiken auf links-rechts kontextsensitiven 
Grammatiken basieren und Mehrdeutigkeiten enthalten. 

17. Sprachanalyseverfahren nach Anspruch 1, das weiter folgende Schritte umfasst: 

Filtern der Eingangsphrase über lexikalische Filter; und 

Generieren der Vielzahl von Parsewäldern, basierend auf der gefilterten Eingangsphrase. 

18. Sprachanalyseverfahren nach Anspruch 1, das weiter folgenden Schritt umfasst: 

Ausziehen semantischer Komponenten der Eingangsphrase, basierend auf den Etiketten, die mit den Wörtern 
assoziiert sind. 

19. Sprachanalyseverfahren nach Anspruch 1, das weiter folgenden Schritt umfasst: 

Bereitstellen eines globalen Parsers zum Ausziehen der semantischen Komponenten aus der 
Eingangsphrase, basierend auf den Etiketten, die mit den Wörtern assoziiert sind. 

20. Sprachanalyseverfahren nach Anspruch 19, das weiter folgenden Schritt umfasst: 

Verwalten, basierend auf den ausgezogenen semantischen Komponenten, des Austauschs von Dialog 
zwischen einer Spracherkennungsvorrichtung und einem Anwender. 

21. Sprachanalyseverfahren nach Anspruch 19, das weiter folgenden Schritt umfasst: 

Verwalten, basierend auf den ausgezogenen semantischen Komponenten, des Austauschs von Dialog 
zwischen zwei Anwendern, die verschiedene Sprachen sprechen. 

22. Computerimplementierte Sprachanalysevorrichtung zum Verarbeiten einer Eingangsphrase, die folgendes 
umfasst: 

Mittel zum Bereitstellen einer Vielzahl von Grammatiken (140, 142), die für vorbestimmte Themen indikativ 
sind; 

einen Parsewaldgenerator zum Generieren einer Vielzahl von Parsewäldern (150, 152), zusammenhängend 
mit der Eingangsphrase (118) unter Verwendung der Grammatiken; 

einen Etikettengenerator zum Assoziieren von Etiketten (164, 166) mit Wörtern in der Eingangsphrase (118), 
unter Verwendung der generierten Parsewälder (150, 152); 

einen Etikettenbewertungsgenerator zum Generieren von Bewertungen für die Etiketten (164, 166), basierend 
auf Attributen der Parsewälder; und 

einen Etikettenauswähler zum Auswählen von Etiketten zur Verwendung als geparste Darstellung (180) der 
Eingangsphrase (118), basierend auf den generierten Bewertungen. 

23. Sprachanalysevorrichtung nach Anspruch 22, wobei der Parsewaldgenerator über eine Vielzahl von Iterationen 
ausgeführt wird, so dass jede Iteration andere Parsewälder produziert. 

24. Sprachanalysevorrichtung nach Anspruch 22, wobei der Parsewaldgenerator über eine Vielzahl von Iterationen 
ausgeführt wird, so dass jede Iteration andere Parsewälder bezüglich jeder der Grammatiken produziert. 

25. Sprachanalysevorrichtung nach Anspruch 22, wobei der Etikettenbewertungsgenerator, basierend auf 
bewertungsbasierten Faktoren, die aus der Anzahl von Endstellen, Lückengröße, Tiefe, Anzahl von Nicht-Endstellen 
und Kombinationen derselben bestehenden Gruppe ausgewählt werden, Bewertungen für die Etiketten generiert. 

26. Sprachanalysevorrichtung nach Anspruch 25, wobei der Etikettenbewertungsgenerator mindestens zwei der 
Faktoren unterschiedlich gewichtet. 

27. Sprachanalysevorrichtung nach Anspruch 26, wobei der Etikettenbewertungsgenerator Kontextinformationen 
nutzt, um mindestens zwei der Faktoren unterschiedlich zu gewichten. 

28. Sprachanalysevorrichtung nach Anspruch 27, der weiter folgendes umfasst: 

ein Spracherkennungsmodul zum Ausführen von Spracherkennung der Eingabephrase und zum Generieren 
eines Wortvertrauensvektors für die Eingangsphrase, wobei im Wesentlichen der 
Etikettenbewertungsgenerator mindestens zwei der Faktoren, basierend auf dem generierten 
Wortvertrauensvektor unterschiedlich gewichtet. 

29. Sprachanalysevorrichtung nach Anspruch 27 oder 28, der weiter folgendes umfasst: 

einen Dialogverwalter zum Generieren einer Anforderung nach Informationen, zusammenhängend mit einem 
vorbestimmten Thema, wobei der Dialogverwalter Dialogkontextgewichte, basierend auf der generierten 
Anforderung nach Informationen generiert, wobei der Etikettenbewertungsgenerator mindestens zwei der 
Faktoren, basierend auf den generierten Dialogkontextgewichten unterschiedlich gewichtet. 

30. Sprachanalysevorrichtung nach Anspruch 22, der weiter folgendes umfasst: 

einen Etikettenbewertungsgenerator zum Generieren von Bewertungen für die Etiketten; und 

einen Etikettenauswähler zum Auswählen von N-besten Etiketten für die Verwendung in der geparsten 
Darstellung, basierend auf den generierten Bewertungen. 

31. Sprachanalysevorrichtung nach Anspruch 30, wobei der Parsewaldgenerator und der Etikettengenerator über eine 
Vielzahl von Iterationen ausgeführt werden, wobei die ausgewählten N-besten Etiketten einer ersten Iteration als 
Eingabe für den Parsewaldgenerator und den Etikettengenerator während einer zweiten Iteration verwendet 
werden. 

32. Sprachanalysevorrichtung nach Anspruch 22, wobei die Etiketten indikativ für die Themen der Grammatiken sind. 

33. Sprachanalysevorrichtung nach Anspruch 22, wobei die Eingangsphrase bezüglich mindestens eines Teils der 
Eingangsphrase grammatikalisch inkorrekt ist, wobei die Parsewaldgeneratoren unter Verwendung der 
Grammatiken eine Vielzahl von Parsewäldern, zusammenhängend mit der grammatikalisch inkorrekten Eingangsphrase 
generieren, wobei der Etikettengenerator unter Verwendung der generierten Parsewälder Etiketten mit Wörtern 
in der grammatikalisch inkorrekten Phrase assoziiert, wobei die Etiketten als geparste Darstellung der 
grammatikalisch inkorrekten Eingangsphrase mit den Wörtern assoziiert werden. 

34. Sprachanalysevorrichtung nach Anspruch 22, wobei die Grammatiken auf links-rechts kontextsensitiven 
Grammatiken basieren. 

35. Sprachanalysevorrichtung nach Anspruch 22, wobei die Grammatiken auf links-rechts kontextsensitiven 
Grammatiken basieren und Mehrdeutigkeiten enthalten. 

36. Sprachanalysevorrichtung nach Anspruch 22, die weiter folgendes umfasst: 

einen lexikalischen Filter zum Filtern der Eingangsphrase, wobei der Parsewaldgenerator die Vielzahl von 
Parsewäldern, basierend auf der gefilterten Eingangsphrase generiert. 

37. Sprachanalysevorrichtung nach Anspruch 22, die weiter folgendes umfasst: 

einen semantischen Extraktor zum Ausziehen semantischer Komponenten aus der Eingangsphrase, 
basierend auf den Etiketten, die mit den Wörtern assoziiert sind. 

38. Sprachanalysevorrichtung nach Anspruch 37, die weiter folgendes umfasst: 

einen globalen Parser zum Ausziehen der semantischen Komponenten aus der Eingangsphrase, basierend 
auf den Etiketten, die mit den Wörtern assoziiert sind. 

39. Sprachanalysevorrichtung nach Anspruch 38, die weiter folgendes umfasst: 

einen Dialogverwalter zum Verwalten, basierend auf den ausgezogenen semantischen Komponenten, des 
Austauschs von Dialog zwischen einer Spracherkennungsvorrichtung und einem Anwender. 

40. Sprachanalysevorrichtung nach Anspruch 39, die weiter folgendes umfasst: 

einen Dialogverwalter zum Verwalten, basierend auf den ausgezogenen semantischen Komponenten, des 
Austauschs von Dialog zwischen zwei Anwendern, die verschiedene Sprachen sprechen. 



Revendications 

1. Procédé d'analyse de la parole mis en oeuvre par ordinateur, servant à traiter une phrase à l'entrée (118), 
comportant les étapes qui consistent 

(a) à fournir une pluralité de grammaires (140, 142) indiquant des sujets prédéterminés; 

(b) à produire une pluralité de forêts d'analyse (150, 152) liées à cette phrase à l'entrée (118) en utilisant ces 
grammaires; 

(c) à associer des étiquettes (164, 166) avec des mots dans la phrase à l'entrée (118) en utilisant ces forêts 
d'analyse produites (150, 152); 

(d) à produire des scores pour ces étiquettes (164, 166) sur la base des attributs de ces forêts d'analyse (150, 
152); et 

(e) à sélectionner des étiquettes (164, 166) qui seront utilisées comme représentation analysée (180) de cette 
phrase à l'entrée (118) sur la base de ces scores produits. 

2. Procédé d'analyse de la parole selon la revendication 1, comportant par ailleurs l'étape qui consiste 

à effectuer l'étape (b) avec une pluralité d'itérations de sorte que chaque itération produit d'autres forêts 
d'analyse. 

3. Procédé d'analyse de la parole selon la revendication 1, caractérisé en ce que l'étape (b) s'effectue de manière 
essentiellement simultanée pour chacune des grammaires. 

4. Procédé d'analyse de la parole selon la revendication 3, comportant par ailleurs l'étape qui consiste 

à effectuer l'étape (b) avec une pluralité d'itérations, caractérisé en ce que chaque itération produit d'autres 
forêts d'analyse relativement à chacune des grammaires. 

5. Procédé d'analyse de la parole selon la revendication 1, comportant par ailleurs l'étape qui consiste 

à produire des scores pour ces étiquettes sur la base de certains facteurs fonction des scores qui sont 
sélectionnés dans le groupe composé du nombre de terminaux, de la taille des intervalles, de la profondeur, du 
nombre de non-terminaux, et de combinaisons de ceux-ci. 

6. Procédé d'analyse de la parole selon la revendication 5, comportant par ailleurs l'étape qui consiste 

à pondérer au moins deux de ces facteurs de manière différente. 

7. Procédé d'analyse de la parole selon la revendication 6, comportant par ailleurs l'étape qui consiste 

à utiliser l'information de contexte pour pondérer au moins deux de ces facteurs de manière différente. 

8. The speech parsing method of claim 7, further comprising the steps of

generating a word confidence vector for the input sentence substantially during speech recognition of the
input sentence; and

weighting at least two of the factors differently based upon the generated word confidence vector.

9. The speech parsing method of claim 7 or claim 8, further comprising the steps of

generating a request for information related to a predetermined topic;

generating dialogue context weights based upon the generated request for information; and

weighting at least two of the factors differently based upon the generated dialogue context weights.

10. The speech parsing method of claim 7, further comprising the step of

using the context information processor substantially in parallel to perform step (b).

11. The speech parsing method of claim 1, further comprising the steps of

generating scores for the tags; and

selecting the N-best tags for use in the parsed representation based upon the generated scores.

12. The speech parsing method of claim 11, further comprising the steps of

performing steps (b) and (c) for a plurality of iterations; and

using the selected N-best tags of a first iteration as input related to the processing of steps (b) and (c)
of a second iteration.

13. The speech parsing method of claim 1, wherein the tags are indicative of the topics of the grammars.

14. The speech parsing method of claim 1, wherein the input sentence is grammatically incorrect with respect
to at least a portion of the input sentence, the method further comprising the steps of

generating a plurality of parse forests related to the grammatically incorrect input sentence using the
grammars;

associating tags with words in the grammatically incorrect input sentence using the generated parse forests;
and

using the tags associated with the words as a parsed representation of the grammatically incorrect input
sentence.

15. The speech parsing method of claim 1, wherein the grammars are based upon left-right context-sensitive
grammars.

16. The speech parsing method of claim 1, wherein the grammars are based upon left-right context-sensitive
grammars and contain ambiguities.

17. The speech parsing method of claim 1, further comprising the steps of

filtering the input sentence via lexical filters; and

generating the plurality of parse forests based upon the filtered input sentence.

18. The speech parsing method of claim 1, further comprising the step of

extracting semantic components of the input sentence based upon the tags associated with the words.

19. The speech parsing method of claim 1, further comprising the step of

providing a global parser for extracting the semantic components of the input sentence based upon the tags
associated with the words.

20. The speech parsing method of claim 19, further comprising the step of

managing, based upon the extracted semantic components, the exchange of dialogue between a speech
recognition device and a user.

21. The speech parsing method of claim 19, further comprising the step of

managing, based upon the extracted semantic components, the exchange of dialogue between two users who
speak different languages.

22. A computer-implemented speech parsing apparatus for processing an input sentence, comprising

means for providing a plurality of grammars (140, 142) indicative of predetermined topics;

a parse forest generator for generating a plurality of parse forests (150, 152) related to the input
sentence (118) using the grammars;

a tag generator for associating tags (164, 166) with words in the input sentence (118) using the generated
parse forests (150, 152);

a tag score generator for generating scores for the tags (164, 166) based upon attributes of the parse
forests; and

a tag selector for selecting tags for use as a parsed representation (180) of the input sentence (118)
based upon the generated scores.

23. The speech parsing apparatus of claim 22, wherein the parse forest generator is executed for a plurality
of iterations such that each iteration generates further parse forests.

24. The speech parsing apparatus of claim 22, wherein the parse forest generator is executed for a plurality
of iterations such that each iteration generates further parse forests with respect to each of the grammars.

25. The speech parsing apparatus of claim 22, wherein the tag score generator generates scores for the tags
based upon score-based factors selected from the group consisting of number of terminals, gap size, depth,
number of non-terminals, and combinations thereof.

26. The speech parsing apparatus of claim 25, wherein the tag score generator weights at least two of the
factors differently.

27. The speech parsing apparatus of claim 26, wherein the tag score generator uses context information to
weight at least two of the factors differently.

28. The speech parsing apparatus of claim 27, further comprising

a speech recognition module for performing speech recognition of the input sentence and for generating a
word confidence vector for the input sentence substantially,

the tag score generator weighting at least two of the factors differently based upon the generated word
confidence vector.

29. The speech parsing apparatus of claim 27 or claim 28, further comprising

a dialogue manager that generates a request for information related to a predetermined topic, the dialogue
manager generating dialogue context weights based upon the generated request for information, the tag score
generator weighting at least two of the factors differently based upon the generated dialogue context
weights.

30. The speech parsing apparatus of claim 22, further comprising

a tag score generator for generating scores for the tags; and

a tag selector for selecting the N-best tags for use in the parsed representation based upon the generated
scores.

31. The speech parsing apparatus of claim 30, wherein the parse forest generator and the tag generator are
executed for a plurality of iterations, the selected N-best tags of a first iteration being used as input
to the parse forest generator and the tag generator during a second iteration.

32. The speech parsing apparatus of claim 22, wherein the tags are indicative of the topics of the grammars.

33. The speech parsing apparatus of claim 22, wherein the input sentence is grammatically incorrect with
respect to at least a portion of the input sentence, the parse forest generator generating a plurality of
parse forests related to the grammatically incorrect input sentence using the grammars, and the tag
generator associating the tags with the words in the grammatically incorrect input sentence using the
generated parse forests, the tags being associated with the words as a parsed representation of the
grammatically incorrect input sentence.

34. The speech parsing apparatus of claim 22, wherein the grammars are based upon left-right
context-sensitive grammars.

35. The speech parsing apparatus of claim 22, wherein the grammars are based upon left-right
context-sensitive grammars and contain ambiguities.

36. The speech parsing apparatus of claim 22, further comprising

a lexical filter for filtering the input sentence, the parse forest generator generating the plurality of
parse forests based upon the filtered input sentence.

37. The speech parsing apparatus of claim 22, further comprising

a semantic extractor for extracting the semantic components of the input sentence based upon the tags that
are associated with the words.

38. The speech parsing apparatus of claim 37, further comprising

a global parser for extracting the semantic components of the input sentence based upon the tags that are
associated with the words.

39. The speech parsing apparatus of claim 38, further comprising

a dialogue manager for managing, based upon the extracted semantic components, the exchange of dialogue
between a speech recognition device and a user.

40. The speech parsing apparatus of claim 39, further comprising

a dialogue manager for managing, based upon the extracted semantic components, the exchange of dialogue
between two users who speak different languages.
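
By way of illustration only, the tag scoring and N-best selection recited in claims 5, 6 and 11 (and in the corresponding apparatus claims 25, 26 and 30) might be sketched as follows. The class TagCandidate, the functions score_tag and select_n_best, the use of a weighted sum, and all weight values and signs are assumptions introduced for this sketch; the claims require only that scores depend on the listed factors and that at least two factors be weighted differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagCandidate:
    """Hypothetical summary of one tag proposed by a parse forest."""
    tag: str
    num_terminals: int      # terminals matched under the tag
    gap_size: int           # size of the gaps inside the tagged span
    depth: int              # depth of the parse tree behind the tag
    num_non_terminals: int  # non-terminal nodes in that tree


# Assumed weights: values and signs are illustrative, not taken from the claims.
# They differ from one another, so at least two factors are weighted differently.
DEFAULT_WEIGHTS = {
    "num_terminals": 2.0,
    "gap_size": -1.5,
    "depth": -0.5,
    "num_non_terminals": 0.25,
}


def score_tag(candidate, weights=DEFAULT_WEIGHTS):
    """Return a weighted sum of the four factors for one candidate tag."""
    return sum(weights[name] * getattr(candidate, name) for name in weights)


def select_n_best(candidates, n, weights=DEFAULT_WEIGHTS):
    """Return the n highest-scoring candidates, best first."""
    ranked = sorted(candidates, key=lambda c: score_tag(c, weights), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    candidates = [
        TagCandidate("hotel", 2, 3, 4, 5),
        TagCandidate("car", 3, 1, 2, 2),
        TagCandidate("flight", 4, 0, 2, 3),
    ]
    best = select_n_best(candidates, n=2)
    print([c.tag for c in best])  # prints ['flight', 'car']
```

Context information such as a word confidence vector or dialogue context weights (claims 7 to 9) could be modelled in this sketch by passing a different weights mapping to select_n_best; how those weights would be derived is not specified here.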